290 research outputs found

NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

Author: Maglott Donna R.
Pruitt Kim D.
Tatusova Tatiana
Publication venue: Oxford University Press
Publication date: 27/11/2006

NCBI's reference sequence (RefSeq) database is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. The database includes 3774 organisms spanning prokaryotes, eukaryotes and viruses, and has records for 2 879 860 proteins (RefSeq release 19). RefSeq records integrate information from multiple sources, when additional data are available from those sources, and therefore represent a current description of the sequence and its features. Annotations include coding regions, conserved domains, tRNAs, sequence tagged sites (STS), variation, references, gene and protein product names, and database cross-references. Sequence is reviewed and features are added using a combined approach of collaboration and other input from the scientific community, prediction, propagation from GenBank and curation by NCBI staff. The format of all RefSeq records is validated, and an increasing number of tests are being applied to evaluate the quality of sequence and annotation, especially in the context of complete genomic sequence.

Crossref

PubMed Central

NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy

Author: Church
D. R. Maglott
Dalgleish
G. R. Brown
K. D. Pruitt
Petersen
Prakash
Pruitt
Sherry
T. Tatusova
Publication venue: Oxford University Press

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16 000 organisms, 2.4 × 10⁶ genomic records, 13 × 10⁶ proteins and 2 × 10⁶ RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).

Crossref

PubMed Central

NCBI Reference Sequences: current status, policy and new initiatives

Author: Altschul
Altschul
D. R. Maglott
Eddy
Griffiths-Jones
Gulley
K. D. Pruitt
Lowe
Maquat
Schuler
T. Tatusova
W. Klimke
Publication venue: Oxford University Press

NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 10⁶ proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.

Crossref

PubMed Central

Human immunodeficiency virus type 1, human protein interaction database at NCBI

Author: Alfarano
Ashburner
B. E. Sanders-Beer
Cohen
D. R. Maglott
Gayle
K. D. Pruitt
K. S. Katz
Kuiken
R. G. Ptak
Salwinski
W. Fu
Publication venue: Oxford University Press

The ‘Human Immunodeficiency Virus Type 1 (HIV-1), Human Protein Interaction Database’, available through the National Library of Medicine at www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions, was created to catalog all interactions between HIV-1 and human proteins published in the peer-reviewed literature. The database serves the scientific community exploring the discovery of novel HIV vaccine candidates and therapeutic targets. To facilitate this discovery approach, the following information for each HIV-1 human protein interaction is provided and can be retrieved without restriction by web-based downloads and ftp protocols: Reference Sequence (RefSeq) protein accession numbers, Entrez Gene identification numbers, brief descriptions of the interactions, searchable keywords for interactions and PubMed identification numbers (PMIDs) of journal articles describing the interactions. Currently, 2589 unique HIV-1 to human protein interactions and 5135 brief descriptions of the interactions, with a total of 14 312 PMID references to the original articles reporting the interactions, are stored in this growing database. In addition, all protein–protein interactions documented in the database are integrated into Entrez Gene records and listed in the ‘HIV-1 protein interactions’ section of Entrez Gene reports. The database is also tightly linked to other databases through Entrez Gene, enabling users to search for an abundance of information related to HIV pathogenesis and replication

Crossref

PubMed Central

Automatic Assignment of EC Numbers

Author: A McNaught
AJ Barrett
CH Wu
D Latino
D Maglott
Dietmar Schomburg
H Ma
Herbert M. Sauro
I Schomburg
Ida Schomburg
J Apostolakis
JS Edwards
M Hattori
M Kotera
NM Luscombe
P Willet
R Caspi
R Körner
S Goto
S Schmidt
TJ Hubbard
Volker Egelhofer
Publication venue: Public Library of Science
Publication date: 01/01/2010

A wide range of research areas in molecular biology and medical biochemistry require a reliable enzyme classification system, e.g., drug design, metabolic network reconstruction and systems biology. When research scientists in the above-mentioned areas wish to unambiguously refer to an enzyme and its function, the EC number introduced by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB) is used. However, each and every one of these applications is critically dependent upon the consistency and reliability of the underlying data for success. We have developed tools for the validation of the EC number classification scheme. In this paper, we present validated data of 3788 enzymatic reactions including 229 sub-subclasses of the EC classification system. Over 80% agreement was found between our assignment and the EC classification. For 61 (i.e., only 2.5%) reactions we found that their assignment was inconsistent with the rules of the nomenclature committee; they have to be transferred to other sub-subclasses. We demonstrate that our validation results can be used to initiate corrections and improvements to the EC number classification scheme.

Crossref

Directory of Open Access Journals

PubMed Central

A gene signature for post-infectious chronic fatigue syndrome

Author: Abhijit Chaudhuri
AK Smith
B Cameron
B Evengård
Celia Cannon
D Buchwald
D Maglott
DR Shafren
G Broderick
G Kennedy
G Kennedy
H Fang
H Gräns
IB Jeffery
J Gu
J Smith
J Vecchiet
John W Gow
JR Kerr
JR Kerr
JW Gow
JW Gow
K Fukuda
KA Kurian
L Carmel
L Jason
LA Karaivanova
LJ Morrison
LM Cope
M Maes
M Maes
M Steinau
N Kaushik
Pawel Herzyk
Peter O Behan
PJ McLaren
R Breitling
R Breitling
R Edgar
RA Irizarry
RB Moss
RS Richards
RW Finberg
S Hafenstein
S Mikami
S Wessely
S Wessely
SD Vernon
SD Vernon
Suzanne Hagan
T Pham
T Saiki
T Watanabe
T Whistler
Y Manuel
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Background: At present, there are no clinically reliable disease markers for chronic fatigue syndrome. DNA chip microarray technology provides a method for examining the differential expression of mRNA from a large number of genes. Our hypothesis was that a gene expression signature, generated by microarray assays, could help identify genes which are dysregulated in patients with post-infectious CFS and so help identify biomarkers for the condition. Methods: Human genome-wide Affymetrix GeneChip arrays (39,000 transcripts derived from 33,000 gene sequences) were used to compare the levels of gene expression in the peripheral blood mononuclear cells of male patients with post-infectious chronic fatigue (n = 8) and male healthy control subjects (n = 7). Results: Patients and healthy subjects differed significantly in the level of expression of 366 genes. Analysis of the differentially expressed genes indicated functional implications in immune modulation, oxidative stress and apoptosis. Prototype biomarkers were identified on the basis of differential levels of gene expression and possible biological significance. Conclusion: Differential expression of key genes identified in this study offers an insight into the possible mechanism of chronic fatigue following infection. The representative biomarkers identified in this research appear promising as potential biomarkers for diagnosis and treatment.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Enlighten

ResearchOnline@GCU

Identifying differential correlation in gene/pathway combinations

Author: A Bhattacharjee
A Subramanian
A Szabo
A Szabo
B Bolstad
D Kostka
D Maglott
D Singh
DR Rhodes
G Parmigiani
Giovanni Parmigiani
I Dinu
I Jolliffe
K Mirnics
K Yeung
KC Li
KC Li
Leslie Cope
M Dettling
M Kanehisa
M Safran
R Curtis
R Development Core Team
R Gentleman
R Huang
R Huang
Rosemary Braun
S Taga
W Pan
Y Huang
Y Xiao
YP Yu
YY Ho
Z Jiang
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: An important emerging trend in the analysis of microarray data is to incorporate known pathway information a priori. Expression level "summaries" for pathways, obtained from the expression data for the genes constituting the pathway, permit the inclusion of pathway information, reduce the high dimensionality of microarray data, and have the power to elucidate gene-interaction dependencies which are not already accounted for through known pathway identification. Results: We present a novel method for the analysis of microarray data that identifies joint differential expression in gene-pathway pairs. This method takes advantage of known gene pathway memberships to compute a summary expression level for each pathway as a whole. Correlations between the pathway expression summary and the expression levels of genes not already known to be associated with the pathway provide clues to gene interaction dependencies that are not already accounted for through known pathway identification, and statistically significant differences between gene-pathway correlations in phenotypically different cells (e.g., where the expression level of a single gene and a given pathway summary correlate strongly in normal cells but weakly in tumor cells) may indicate biologically relevant gene-pathway interactions. Here, we detail the methodology and present the results of this method applied to two gene-expression datasets, identifying gene-pathway pairs which exhibit differential joint expression by phenotype. Conclusion: The method described herein provides a means by which interactions between large numbers of genes may be identified by incorporating known pathway information to reduce the dimensionality of gene interactions. The method is efficient and easily applied to data sets of ~10² arrays. Application of this method to two publicly-available cancer data sets yields suggestive and promising results. This method has the potential to complement gene-at-a-time analysis techniques for microarray analysis by indicating relationships between pathways and genes that have not previously been identified and which may play a role in disease.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Advanced Genomic Data Mining

Author: A Kasprzyk
B Giardine
D Diez
D Hull
D Karolchik
D Maglott
E Birney
E Segal
Ewan Birney
Fran Lewitter
G Alonso
I Vastrik
J Pratap
J Taylor
JL Ashurst
KD Pruitt
LA Davidson
M Ashburner
MB Eisen
N Chen
N de la Cruz
P Jaiswal
P Rice
R Ihaka
R Ramakrishnan
RA Becker
RC Gentleman
RD Dowell
RM Kuhn
SN Twigger
ST Sherry
TJ Hubbard
TR Golub
TW Harris
Xosé M. Fernández-Suárez
Publication venue: Public Library of Science
Publication date: 01/01/2008

As data banks increase their size, one of the current challenges in bioinformatics is to be able to query them in a sensible way. Information is contained in differen

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Database resources of the National Center for Biotechnology Information

Author: A. Souvorov
Altschul
Altschul
Blumenfeld
D. A. Benson
D. J. Lipman
D. L. Wheeler
D. Landsman
D. M. Church
D. R. Maglott
E. Sequeira
E. Yaschenko
Ermolaeva
Fung
G. D. Schuler
G. Starchenko
Geer
Ghedin
J. Ostell
K. Canese
K. D. Pruitt
K. Sirotkin
L. Wagner
L. Y. Geer
M. DiCuccio
M. Feolo
M. Shumway
Ma
Manolio
Needleman
O. Khovayko
R. Edgar
R. L. Tatusov
S. Federhen
S. H. Bryant
S. T. Sherry
Schuler
Schuler
Sewell
Sherry
T. A. Tatusova
T. Barrett
T. L. Madden
Tatusov
Tatusova
Tatusova
V. Chetvernin
V. Miller
W. Helmberg
Wang
Y. Kapustin
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2006

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace and Assembly Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Viral Genotyping Tools, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at

Crossref

PubMed Central

Comparison study of microarray meta-analysis methods

Author: A Ramasamy
AA Alizadeh
AC Fierro
Anna Campain
AV Ivshina
D Maglott
DR Rhodes
F Hong
G Parmigiani
G Parmigiani
GK Smyth TNP
I Lonnstedt
J Steven
JK Choi
M Ritchie
MA Shipp
O Larsson
P Cahan
P Farmer
P Warnat
R Bosotti
R Breitling
R DerSimonian
R Gentleman
R Grützmann
R Guerra
RA Fisher
RA Irizarry
S Dudoit
S Dudoit
S Loi
S Lu
SL Normand
TL Fare
VG Tusher
WE Johnson
Yee Hwa Yang
YH Yang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Meta-analysis methods exist for combining multiple microarray datasets. However, there is a wide range of issues associated with microarray meta-analysis and a limited ability to compare the performance of different meta-analysis methods. Results: We compare eight meta-analysis methods: five existing methods, two naive methods and a novel approach (mDEDS). Comparisons are performed using simulated data and two biological case studies with varying degrees of meta-analysis complexity. The performance of meta-analysis methods is assessed via ROC curves and prediction accuracy where applicable. Conclusions: Existing meta-analysis methods vary in their ability to perform successful meta-analysis. This success is very dependent on the complexity of the data and type of analysis. Our proposed method, mDEDS, performs competitively as a meta-analysis tool even as complexity increases. Because of the varying abilities of compared meta-analysis methods, care should be taken when considering the meta-analysis method used for particular research.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central